A Machine Learning Based Framework for Verification and Validation of Massive Scale Image Data
Authors
Abstract
Big data validation and system verification are crucial for ensuring the quality of big data applications. However, a rigorous technique for such tasks has yet to emerge. Over the past decade, we have developed a big data system called CMA for investigating the classification of biological cells based on cell morphology captured in diffraction images. CMA includes a group of scientific software tools, machine learning algorithms, and a large-scale cell image repository. We have also developed a framework for rigorous validation of the massive-scale image data and verification of both the software systems and the machine learning algorithms. Different machine learning algorithms integrated with image processing techniques were used to automate the selection and validation of the massive-scale image data in CMA. An experiment-based technique guided by a feature selection algorithm was introduced in the framework to select optimal machine learning features. An iterative metamorphic testing approach is applied to test the scientific software. Because the scientific software is non-testable, a machine learning approach is introduced to develop test oracles iteratively and ensure the adequacy of the test coverage criteria. Performance of the machine learning algorithms is evaluated with stratified N-fold cross-validation and confusion matrices. We describe the design of the proposed framework with CMA as the case study. The effectiveness of the framework is demonstrated by verifying and validating the data set, software systems, and algorithms in CMA.
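The evaluation step named in the abstract, stratified N-fold cross-validation summarized by a confusion matrix, can be illustrated with a minimal sketch. This is not the CMA implementation: the nearest-centroid classifier, the synthetic two-dimensional features, and the labels `cell_a` and `cell_b` are all assumptions made for illustration, standing in for the real classifiers and diffraction-image features.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Split sample indices into n_folds folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def nearest_centroid_predict(train_x, train_y, test_x):
    """Toy stand-in classifier (an assumption): pick the closest class mean."""
    grouped = defaultdict(list)
    for features, label in zip(train_x, train_y):
        grouped[label].append(features)
    centroids = {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        min(sorted(centroids), key=lambda lab: squared_distance(x, centroids[lab]))
        for x in test_x
    ]


def cross_validated_confusion(features, labels, n_folds=5, seed=0):
    """Accumulate one confusion matrix, matrix[true][predicted], over all folds."""
    classes = sorted(set(labels))
    matrix = {t: {p: 0 for p in classes} for t in classes}
    for held_out in stratified_folds(labels, n_folds, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        predictions = nearest_centroid_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in held_out],
        )
        for i, predicted in zip(held_out, predictions):
            matrix[labels[i]][predicted] += 1
    return matrix


# Synthetic, well-separated demo data (hypothetical; not from the CMA repository).
rng = random.Random(42)
features, labels = [], []
for label, center in (("cell_a", 0.0), ("cell_b", 10.0)):
    for _ in range(10):
        features.append([center + rng.uniform(-1, 1), center + rng.uniform(-1, 1)])
        labels.append(label)

matrix = cross_validated_confusion(features, labels, n_folds=5)
for true_label in sorted(matrix):
    print(true_label, matrix[true_label])
```

Each sample is held out exactly once, so the entries of the accumulated matrix sum to the data set size, and the off-diagonal entries show which classes are confused with each other. Stratification matters when classes are imbalanced, because a plain random split can leave a fold with few or no samples of a rare class.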
Similar Resources
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing, where there exist many low-level image representations, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...
Developing and validation of metamemory scale for adolescents
The purpose of this study was to develop and validate a metamemory scale for adolescents in the academic context. The study was mixed-method research of the sequential exploratory type, which in the qualitative stage used a triangulation method (aligning multiple data approach) with four dimensions: a) collecting literature reviews related to metamemory based on the theoretica...
Automatic road crack detection and classification using image processing techniques, machine learning and integrated models in urban areas: A novel image binarization technique
The quality of the road pavement has always been one of the major concerns for governments around the world. Cracks in the asphalt are one of the most common road tensions that generally threaten the safety of roads and highways. In recent years, automated inspection methods such as image and video processing have been considered due to the high cost and error of manual metho...
Prostate cancer radiomics: A study on IMRT response prediction based on MR image features and machine learning approaches
Introduction: To develop different radiomic models based on radiomic features and machine learning methods to predict early intensity-modulated radiation therapy (IMRT) response. Materials and Methods: Thirty prostate patients were included. All patients underwent pre- and post-IMRT T2-weighted and apparent diffusion coefficient (ADC) magnetic resonance imagi...
Verification and Validation of Common Derivative Terms Approximation in Meshfree Numerical Scheme
In order to improve the approximation of spatial derivatives without meshes, a set of meshfree numerical schemes for derivative terms is developed, which is compatible with Cartesian, cylindrical, and spherical coordinates. Based on comparisons between numerical and theoretical solutions, errors and convergence are assessed by an a posteriori method, which shows that the approximations...